Tasks: Pre-Baked Vagrant Box Workflow
Input: Design documents from /specs/009-vagrant-prebaked-boxes/
Prerequisites: plan.md (required), spec.md (required), research.md, data-model.md, quickstart.md
Tests: Not explicitly requested. Manual validation via the demo scenario playbooks serves as the functional test.
Organization: Tasks grouped by user story to enable independent implementation and testing.
Format: [ID] [P?] [Story] Description
- [P]: Can run in parallel (different files, no dependencies)
- [Story]: Which user story this task belongs to (e.g., US1, US2, US3, US4)
- Include exact file paths in descriptions
Phase 1: Setup (Shared Infrastructure)
Purpose: Create directory structure, update gitignore and Makefile
- [X] T001 [P] Update .gitignore to add the demo/vagrant/boxes/ pattern and update demo/vagrant/.gitignore to add boxes/ and *.box patterns
- [X] T002 [P] Create demo/vagrant/boxes/ directory with a .gitkeep file to preserve the directory in version control
- [X] T003 [P] Add demo-bake and demo-refresh targets to Makefile (add to .PHONY line 14 and add target definitions after line 118, following existing cloud demo target pattern)
Phase 2: Foundational (Blocking Prerequisites)
Purpose: Core infrastructure that MUST be complete before ANY user story can be implemented
CRITICAL: No user story work can begin until this phase is complete
- [X] T004 Modify demo/vagrant/Vagrantfile to support conditional box selection: read ENV['RCD_PREBAKED'], set per-node vm.vm.box to "rcd-cui-#{name}" when prebaked or 'generic/rocky9' otherwise, and wrap the Ansible provisioner block (lines 50-62) in an unless ENV['RCD_PREBAKED'] == '1' conditional
- [X] T005 Create shared helper library demo/scripts/lib-demo-common.sh with functions: log_info(), log_warn(), log_error(), detect_provider() (extracted from demo-setup.sh lines 39-76), generate_set_label() (format: rcd-demo-YYYYMMDD-NN), read_manifest(), write_manifest(), init_manifest(), get_current_set(), check_staleness(), all using jq for JSON operations on demo/vagrant/boxes/manifest.json per the schema in data-model.md
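The label format rcd-demo-YYYYMMDD-NN from T005 could be generated along the following lines. This is a minimal sketch, not the actual generate_set_label(): the function name next_set_label and its calling convention (date first, existing labels as remaining arguments) are assumptions made for illustration, and it assumes NN is a per-day counter starting at 01.

```shell
#!/usr/bin/env bash
# Sketch only: next_set_label is a hypothetical stand-in for generate_set_label().
# Usage: next_set_label YYYYMMDD [existing-label ...]
next_set_label() {
  local day="$1"; shift
  local max=0 label seq
  for label in "$@"; do
    # Only labels from the same day contribute to the sequence number.
    if [[ "$label" =~ ^rcd-demo-${day}-([0-9]{2})$ ]]; then
      # Force base 10 so that "08" and "09" are not parsed as octal.
      seq=$((10#${BASH_REMATCH[1]}))
      if (( seq > max )); then
        max=$seq
      fi
    fi
  done
  printf 'rcd-demo-%s-%02d\n' "$day" $((max + 1))
}
```

A caller would typically pass "$(date +%Y%m%d)" as the first argument and the labels already recorded in the manifest as the rest.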
Checkpoint: Foundation ready; user story implementation can now begin
Phase 3: User Story 1 - Package a Provisioned Cluster as Reusable Boxes (Priority: P1) MVP
Goal: Create demo-bake.sh that packages all 4 running VMs into .box files with manifest tracking and 2-set rotation.
Independent Test: Run demo-setup.sh (from scratch), then demo-bake.sh, verify 4 .box files exist in demo/vagrant/boxes/ and manifest.json is valid. Run demo-bake.sh --list and confirm output. Run demo-bake.sh --delete <label> and confirm cleanup.
Implementation for User Story 1
- [X] T006 [US1] Create demo/scripts/demo-bake.sh with script header (set -euo pipefail, SCRIPT_DIR, REPO_ROOT, VAGRANT_CWD), source lib-demo-common.sh, and argument parsing: no args = bake, --list = list sets, --delete <label> = delete set, --delete-all = delete all, --help = usage
- [X] T007 [US1] Implement verify_cluster_running() function in demo/scripts/demo-bake.sh that checks all 4 VMs (mgmt01, login01, compute01, compute02) are running via vagrant status and exits with clear error if not
- [X] T008 [US1] Implement package_vm() function in demo/scripts/demo-bake.sh with provider-specific branching: VirtualBox uses vagrant package <vm> --output <path>; libvirt uses vagrant package <vm> --output <path> with VAGRANT_LIBVIRT_VIRT_SYSPREP_OPERATIONS="defaults,-ssh-userdir,-ssh-hostkeys,-lvm-uuids" to preserve FreeIPA/Munge state; QEMU uses manual flow: halt VM, locate disk at .vagrant/machines/<vm>/qemu/vq_*/linked-box.img, qemu-img convert -O qcow2 -c to compress, create metadata.json and Vagrantfile, tar czf into .box
- [X] T009 [US1] Implement bake_all() function in demo/scripts/demo-bake.sh that: checks disk space (warn if < 20 GB free), generates a set label via generate_set_label(), calls package_vm() for each of the 4 VMs, performs 2-set rotation (delete previous, relabel current to previous, set new as current), writes manifest, and prints summary with total size
- [X] T010 [US1] Implement list_sets() function in demo/scripts/demo-bake.sh that reads manifest and prints a formatted table: Label, Created, Provider, Age, Commit, Status (current/previous), Total Size
- [X] T011 [US1] Implement delete_set() and delete_all() functions in demo/scripts/demo-bake.sh that remove box files from demo/vagrant/boxes/, deregister from Vagrant via vagrant box remove, update manifest, and report reclaimed disk space
- [X] T012 [US1] Add trap handler in demo/scripts/demo-bake.sh for SIGINT/SIGTERM that cleans up any partially created .box files in progress (track current packaging file in a variable, remove on interrupt)
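The interrupt handling described in T012 might look roughly like this. It is a sketch under assumptions: the variable CURRENT_BOX_FILE and the functions cleanup_partial_box and on_interrupt are hypothetical names, and the exit status 130 is simply the common shell convention for an interrupted script, not something the task specifies.

```shell
#!/usr/bin/env bash
# Sketch only: CURRENT_BOX_FILE, cleanup_partial_box, and on_interrupt are
# hypothetical names chosen for illustration.
CURRENT_BOX_FILE=""

cleanup_partial_box() {
  # Remove the file being packaged, if any, so no truncated .box is left behind.
  if [[ -n "$CURRENT_BOX_FILE" && -f "$CURRENT_BOX_FILE" ]]; then
    rm -f -- "$CURRENT_BOX_FILE"
    echo "Removed partial box: $CURRENT_BOX_FILE" >&2
  fi
  CURRENT_BOX_FILE=""
}

on_interrupt() {
  cleanup_partial_box
  exit 130   # conventional status for an interrupted script (assumption)
}

trap on_interrupt INT TERM
```

In this shape, package_vm() would set CURRENT_BOX_FILE to the output path just before packaging starts and reset it to an empty string once the box is complete, so a finished box is never removed by the handler.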
Checkpoint: demo-bake.sh is fully functional. Can package, list, and delete box sets. make demo-bake works.
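For reference, the command-line surface listed in T006 (no args, --list, --delete <label>, --delete-all, --help) could be dispatched as sketched below. The function name parse_args, the ACTION and TARGET_LABEL variables, and the return code 2 for usage errors are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only: parse_args, ACTION, and TARGET_LABEL are hypothetical names.
ACTION="bake"
TARGET_LABEL=""

parse_args() {
  ACTION="bake"        # no arguments means "bake"
  TARGET_LABEL=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --list)       ACTION="list" ;;
      --delete)
        # --delete requires a set label as its value.
        if [[ $# -lt 2 ]]; then
          echo "error: --delete requires a <label>" >&2
          return 2
        fi
        ACTION="delete"
        TARGET_LABEL="$2"
        shift ;;
      --delete-all) ACTION="delete-all" ;;
      --help)       ACTION="help" ;;
      *)
        echo "error: unknown argument: $1" >&2
        return 2 ;;
    esac
    shift
  done
}
```

The script body would then call parse_args "$@" and branch on ACTION to reach bake_all(), list_sets(), delete_set(), or delete_all().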
Phase 4: User Story 2 - Boot a Demo Cluster from Pre-Baked Boxes (Priority: P1)
Goal: Modify demo-setup.sh to detect baked boxes, offer to use them (skipping provisioning), run post-restore playbook, create baseline snapshot, and prompt to bake after fresh provisions.
Independent Test: With baked boxes present, run demo-setup.sh, accept baked boot, verify all services operational and demo scenario A passes. With no boxes, verify normal provisioning flow is unchanged.
Depends on: User Story 1 (T006-T012); baked boxes must exist to test boot
Implementation for User Story 2
- [X] T013 [US2] Add check_baked_boxes() function to demo/scripts/demo-setup.sh that sources lib-demo-common.sh, reads the manifest, checks if a current box set exists, verifies provider matches detected provider (exit with clear error on mismatch), compares stored vagrant_version with current vagrant --version (warn on mismatch), checks staleness (warn if > DEMO_STALE_DAYS, default 7), and returns 0 if usable boxes found
- [X] T014 [US2] Add baked-box prompt logic to demo/scripts/demo-setup.sh (after prerequisite checks, ~line 195): if check_baked_boxes returns 0, display set info (label, age, commit) and prompt "Use baked boxes? [Y/n]"; if user accepts, set USE_BAKED=true
- [X] T015 [US2] Implement baked-boot flow in demo/scripts/demo-setup.sh: when USE_BAKED=true, run vagrant box add --force --name rcd-cui-<vm> <box-path> for each VM, set RCD_PREBAKED=1 env var, run vagrant up --no-provision --provider <provider>, skip all Ansible provisioning
- [X] T016 [US2] Add post-restore service reconciliation and health check to demo/scripts/demo-setup.sh: when USE_BAKED=true, after VMs are running, execute ansible-playbook with demo/playbooks/post-restore.yml using the appropriate inventory (QEMU uses runtime inventory, others use static), then verify critical services are operational (FreeIPA, slurmctld, wazuh-manager, nfs-server on mgmt01; slurmd on compute nodes; munge and chronyd on all nodes) via SSH service checks, then create baseline snapshot via vagrant snapshot push baseline
- [X] T017 [US2] Add auto-bake prompt to demo/scripts/demo-setup.sh after successful fresh provision (after baseline snapshot, ~line 225): if DEMO_USE_BAKED is not 0, prompt "Bake this cluster for future fast starts? [Y/n]" and invoke demo-bake.sh if confirmed
- [X] T018 [US2] Ensure backward compatibility in demo/scripts/demo-setup.sh: when no baked boxes exist and DEMO_USE_BAKED is unset, the entire flow runs identically to the pre-feature behavior with no visible changes to the user
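The staleness rule in T013 (warn when a set is older than DEMO_STALE_DAYS, default 7) reduces to a small date comparison. The sketch below assumes the creation time is available as epoch seconds; the manifest schema lives in data-model.md and is not reproduced here, so the function name is_stale and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: is_stale is a hypothetical stand-in for check_staleness().
# Usage: is_stale CREATED_EPOCH [NOW_EPOCH]
# Returns 0 (stale) when the set is older than DEMO_STALE_DAYS days (default 7).
is_stale() {
  local created="$1"
  local now="${2:-$(date +%s)}"
  local limit_days="${DEMO_STALE_DAYS:-7}"
  # Whole days elapsed since the set was created.
  local age_days=$(( (now - created) / 86400 ))
  (( age_days > limit_days ))
}
```

Because the task only asks for a warning, a caller would log via log_warn() when is_stale succeeds and continue rather than abort.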
Checkpoint: Full bake-then-boot cycle works. Boot from baked boxes in < 5 min. Demo scenario A passes against baked-boot cluster. Fresh provision flow unchanged.
Phase 5: User Story 3 - Rebuild Baked Boxes from Current Codebase (Priority: P2)
Goal: Create demo-refresh.sh for single-command destroy-provision-bake cycle.
Independent Test: Run demo-refresh.sh, verify VMs are destroyed, reprovisioned, baked, and new manifest reflects current commit.
Depends on: User Story 1 (bake functionality), User Story 2 (boot from baked)
Implementation for User Story 3
- [X] T019 [US3] Create demo/scripts/demo-refresh.sh that: sources lib-demo-common.sh, destroys existing VMs (vagrant destroy -f), preserves the current baked box set as safety net (do not delete until new bake succeeds), runs demo-setup.sh with DEMO_USE_BAKED=0 (force fresh), on success calls demo-bake.sh to create new boxes, on failure exits non-zero while preserving previous box set and printing clear error message
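The control flow of T019 can be sketched as three steps with early exit on failure. The wrapper functions destroy_vms, provision_fresh, bake_boxes, and run_refresh are hypothetical names, and the sketch assumes SCRIPT_DIR points at demo/scripts/ and that VAGRANT_CWD is already set by the script header.

```shell
#!/usr/bin/env bash
# Sketch only: all four function names are hypothetical. SCRIPT_DIR is assumed
# to point at demo/scripts/, and VAGRANT_CWD is assumed to be set by the caller.
SCRIPT_DIR="${SCRIPT_DIR:-demo/scripts}"

destroy_vms()     { vagrant destroy -f; }
provision_fresh() { DEMO_USE_BAKED=0 "$SCRIPT_DIR/demo-setup.sh"; }
bake_boxes()      { "$SCRIPT_DIR/demo-bake.sh"; }

run_refresh() {
  if ! destroy_vms; then
    echo "error: could not destroy existing VMs" >&2
    return 1
  fi
  # The existing box set is not touched here, so it remains as a fallback.
  if ! provision_fresh; then
    echo "error: fresh provision failed; existing box set left in place" >&2
    return 1
  fi
  # Bake only after a successful provision.
  if ! bake_boxes; then
    echo "error: bake step failed" >&2
    return 1
  fi
}
```

Keeping box deletion out of this script entirely, and leaving rotation to demo-bake.sh, is one way to satisfy the requirement that the previous set survives a failed refresh.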
Checkpoint: demo-refresh.sh completes full destroy-provision-bake cycle. make demo-refresh works. Previous boxes preserved on failure.
Phase 6: User Story 4 - Override Baked Box Behavior via Environment Variable (Priority: P3)
Goal: Ensure DEMO_USE_BAKED env var provides deterministic non-interactive control.
Independent Test: Run with DEMO_USE_BAKED=1 (boxes present → boots silently; no boxes → error). Run with DEMO_USE_BAKED=0 (boxes present → provisions from scratch silently, no auto-bake prompt).
Depends on: User Story 2 (prompt logic already in place)
Implementation for User Story 4
- [X] T020 [US4] Implement DEMO_USE_BAKED=1 logic in demo/scripts/demo-setup.sh: skip all prompts, force baked-box boot; if no boxes exist, exit with error message: "No baked boxes found. Run './demo/scripts/demo-bake.sh' after a successful provision to create them."
- [X] T021 [US4] Implement DEMO_USE_BAKED=0 logic in demo/scripts/demo-setup.sh: skip all prompts, force fresh provisioning, suppress auto-bake prompt after provision (FR-017 suppression), ignore available boxes entirely
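The three-way behavior of DEMO_USE_BAKED (1, 0, or unset) described in T018, T020, and T021 can be captured in one decision function. This is an illustrative sketch: resolve_boot_mode and its "yes"/"no" argument are hypothetical, and treating any value other than 0 or 1 the same as unset is an assumption the tasks do not spell out.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_boot_mode is a hypothetical helper.
# Usage: resolve_boot_mode BOXES_AVAILABLE   (BOXES_AVAILABLE is "yes" or "no")
# Prints "baked", "fresh", or "prompt"; returns 1 when DEMO_USE_BAKED=1 but
# no boxes exist.
resolve_boot_mode() {
  local boxes_available="$1"
  case "${DEMO_USE_BAKED:-}" in
    1)
      if [[ "$boxes_available" != "yes" ]]; then
        echo "No baked boxes found. Run './demo/scripts/demo-bake.sh' after a successful provision to create them." >&2
        return 1
      fi
      echo "baked" ;;
    0)
      # Forced fresh provision; available boxes are ignored.
      echo "fresh" ;;
    *)
      # Unset (or unrecognized, by assumption): prompt only when usable boxes
      # exist, otherwise keep the pre-feature provisioning flow.
      if [[ "$boxes_available" == "yes" ]]; then
        echo "prompt"
      else
        echo "fresh"
      fi ;;
  esac
}
```

demo-setup.sh could call this once after check_baked_boxes and branch on the printed mode, which keeps the non-interactive paths free of any read prompts.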
Checkpoint: DEMO_USE_BAKED=1 ./demo/scripts/demo-setup.sh and DEMO_USE_BAKED=0 ./demo/scripts/demo-setup.sh both behave deterministically without prompts.
Phase 7: Polish & Cross-Cutting Concerns
Purpose: Documentation, compatibility validation, edge case handling
- [X] T022 Verify demo/scripts/demo-reset.sh works correctly with clusters booted from baked boxes: confirm vagrant snapshot pop baseline and vagrant snapshot push baseline function identically to fresh-provisioned clusters
- [X] T023 [P] Run quickstart.md validation: execute Workflow A (first-time bake), Workflow B (fast demo start), and Workflow D (non-interactive) from specs/009-vagrant-prebaked-boxes/quickstart.md and document results
- [X] T024 [P] Add QEMU best-effort limitations documentation to demo/README.md: document that QEMU baking uses raw disk export (not native vagrant package), may have larger box files, and requires qemu-img and jq as additional prerequisites
Dependencies & Execution Order
Phase Dependencies
- Setup (Phase 1): No dependencies; can start immediately
- Foundational (Phase 2): Depends on Setup completion; BLOCKS all user stories
- User Story 1 (Phase 3): Depends on Foundational; can start after T004-T005
- User Story 2 (Phase 4): Depends on Foundational AND User Story 1; needs baked boxes to test boot
- User Story 3 (Phase 5): Depends on User Story 1 + User Story 2; uses both bake and boot
- User Story 4 (Phase 6): Depends on User Story 2; refines prompt/override logic
- Polish (Phase 7): Depends on all user stories being complete
User Story Dependencies
- User Story 1 (P1): Can start after Foundational (Phase 2); no dependencies on other stories
- User Story 2 (P1): Depends on User Story 1 (needs baked boxes to exist for testing boot flow)
- User Story 3 (P2): Depends on User Story 1 + 2 (uses bake and boot in sequence)
- User Story 4 (P3): Depends on User Story 2 (adds override logic to existing prompt flow)
Parallel Opportunities
- All Phase 1 tasks (T001, T002, T003) can run in parallel
- Phase 2 tasks T004 and T005 touch different files, so they could run in parallel, but T005 (helper lib) should complete first since T004 (Vagrantfile) is the simpler change
- Within US1: T010 and T011 (list and delete) can run in parallel after T009 (manifest/rotation)
- T022, T023, T024 (Polish) can all run in parallel
Parallel Example: User Story 1
# After Foundational phase complete:
# Sequential (core packaging must be built first):
Task T006: "Create demo-bake.sh argument parsing"
Task T007: "Implement verify_cluster_running()"
Task T008: "Implement package_vm() with provider branching"
Task T009: "Implement bake_all() with rotation"
# Then in parallel (independent operations on same file, different functions):
Task T010: "Implement list_sets()" # reads manifest only
Task T011: "Implement delete operations" # deletes from manifest
# Then:
Task T012: "Add interrupt/cleanup handler"
Implementation Strategy
MVP First (User Story 1 Only)
- Complete Phase 1: Setup (T001-T003)
- Complete Phase 2: Foundational (T004-T005)
- Complete Phase 3: User Story 1 (T006-T012)
- STOP and VALIDATE: Bake a cluster, verify 4 .box files and valid manifest
- This alone delivers value: boxes are ready for manual use even before US2
Incremental Delivery
- Setup + Foundational → Infrastructure ready
- User Story 1 → Can bake boxes → Validate independently
- User Story 2 → Can boot from baked boxes in < 5 min → Validate independently
- User Story 3 → Single-command refresh cycle → Validate independently
- User Story 4 → CI/automation support → Validate independently
- Polish → Documentation and compatibility validation
Notes
- [P] tasks = different files, no dependencies
- [Story] label maps task to specific user story for traceability
- US1 and US2 are both P1 but must be sequential (need boxes before you can boot from them)
- US3 and US4 can theoretically run in parallel since they modify different files
- Commit after each task or logical group
- Stop at any checkpoint to validate story independently
- All scripts should follow existing patterns in demo/scripts/demo-setup.sh (error codes, logging, provider detection)